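Multinomial logistic regression, also known as multiclass logistic regression or softmax regression, is a fundamental classification method that generalizes binary logistic regression to multiclass problems. A recent work proposed a faster gradient, called $\texttt{quadratic gradient}$, that accelerates binary logistic regression training, and presented an enhanced Nesterov's accelerated gradient (NAG) method for binary logistic regression. In this paper, we extend that work to multiclass logistic regression and propose an enhanced Adaptive Gradient Algorithm (Adagrad) that accelerates the original Adagrad method. We test the enhanced NAG method and the enhanced Adagrad method on several multiclass datasets. Experimental results show that each enhanced method converges faster than its original counterpart.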
translated by 谷歌翻译
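For orientation, the baseline that the abstract above sets out to accelerate can be sketched as follows. This is a minimal, standard-library-only sketch of softmax regression trained with plain Adagrad; it does not implement the quadratic gradient or the enhanced methods, and the function names, learning rate, epoch count and toy data are all illustrative assumptions rather than anything taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax_adagrad(X, y, num_classes, lr=0.5, epochs=200, eps=1e-8):
    """Full-batch softmax regression trained with plain Adagrad.

    Returns (W, losses); W[c] is the weight vector of class c (bias last).
    All hyperparameters here are illustrative assumptions.
    """
    n, d = len(X), len(X[0]) + 1                  # +1 for the bias feature
    Xb = [list(row) + [1.0] for row in X]
    W = [[0.0] * d for _ in range(num_classes)]
    G = [[0.0] * d for _ in range(num_classes)]   # accumulated squared gradients
    losses = []
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(num_classes)]
        loss = 0.0
        for xi, yi in zip(Xb, y):
            probs = softmax([sum(w * v for w, v in zip(Wc, xi)) for Wc in W])
            loss -= math.log(probs[yi])
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == yi else 0.0)
                for j in range(d):
                    grad[c][j] += err * xi[j] / n
        losses.append(loss / n)
        # Adagrad: each coordinate gets its own step size, shrinking with
        # the running sum of its squared gradients.
        for c in range(num_classes):
            for j in range(d):
                G[c][j] += grad[c][j] ** 2
                W[c][j] -= lr * grad[c][j] / (math.sqrt(G[c][j]) + eps)
    return W, losses


def predict(W, x):
    """Return the index of the highest-scoring class for input x."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(Wc, xb)) for Wc in W]
    return max(range(len(W)), key=lambda c: scores[c])


# Toy three-class problem (synthetic data, purely for illustration).
random.seed(0)
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
X, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(20):
        X.append([cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)])
        y.append(label)

W, losses = train_softmax_adagrad(X, y, num_classes=3)
accuracy = sum(predict(W, x) == t for x, t in zip(X, y)) / len(X)
```

An enhanced variant in the sense of the abstract would presumably replace `grad` with a preconditioned gradient before the Adagrad update; the details of that replacement are not given in the abstract and are not guessed at here.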
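Watermarking is a commonly used strategy for protecting creators' rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models - in principle, the watermark should be preserved when an adversary tries to copy the model. In practice, however, a smart adversary can often remove the watermark. Several papers have proposed watermarking methods that claim to be resistant to different types of removal attacks, but these techniques often fail against new or better adversaries. In this paper, we propose a certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark cannot be removed unless the model parameters are changed by more than a certain L2 threshold. In addition to being certifiable, our watermark is also empirically more robust than previous watermarking methods. Our experiments can be reproduced at https://github.com/arpitbansal297/certified_watermarks.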
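In this work, we present a novel matrix-encoding method that is particularly convenient for neural networks making predictions in a privacy-preserving manner using homomorphic encryption. Based on this encoding method, we implement a convolutional neural network for handwritten image classification over encrypted data. To perform homomorphic multiplication of two matrices $a$ and $b$, the main idea, in its simplest version, is to encrypt matrix $a$ and the transpose of matrix $b$ into two separate ciphertexts. With additional operations, the homomorphic matrix multiplication can then be computed efficiently over the encrypted matrices. For the convolution operation, we span each convolution kernel in advance to a matrix space of the same size as the input image, generating several ciphertexts, each of which is later used together with the ciphertext encrypting the input images to compute part of the final convolution result. We accumulate all of these intermediate results and thereby complete the convolution operation. In a public cloud with 40 vCPUs, our convolutional neural network implementation on the MNIST test dataset takes $\sim$ 287 seconds to compute the ten likelihoods of 32 encrypted images of size $28 \times 28$ simultaneously. The data owner only needs to upload a single ciphertext ($\sim 19.8$ MB) encrypting these 32 images to the public cloud.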
translated by 谷歌翻译
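The role of the transpose in the matrix-multiplication idea above can be illustrated in plaintext. The sketch below is not the paper's algorithm and involves no encryption; it only shows, under the assumption of square matrices, that a product can be assembled from the kinds of operations homomorphic schemes typically offer (slot-wise multiplication, summation and rotation) once $a$ and the transpose of $b$ are laid out row by row. The helper names are hypothetical.

```python
def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def rotate_rows(M, r):
    """Cyclically shift the rows of M upward by r positions."""
    r %= len(M)
    return M[r:] + M[:r]


def matmul_via_transpose(A, B):
    """Multiply two square matrices from A and the transpose of B.

    Each round rotates the rows of B^T, multiplies A and the rotated B^T
    element-wise, and sums every row. Round r fills one cyclic diagonal
    of the product, so n rounds fill the whole n x n result.
    """
    n = len(A)
    Bt = transpose(B)
    C = [[0] * n for _ in range(n)]
    for r in range(n):
        shifted = rotate_rows(Bt, r)   # row i now holds column (i + r) % n of B
        for i in range(n):
            # Element-wise product of two rows, then a sum: one inner product.
            C[i][(i + r) % n] = sum(a * b for a, b in zip(A[i], shifted[i]))
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul_via_transpose(A, B)   # [[19, 22], [43, 50]]
```

Because row $i$ of $a$ and row $j$ of the transposed $b$ line up slot by slot, every entry of the product is a single element-wise product followed by a sum, which is the property the encoding appears to exploit.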
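Training logistic regression over encrypted data has for years been an attractive approach to addressing security concerns. In this paper, we propose a faster gradient variant, called $\texttt{quadratic gradient}$, for implementing logistic regression training in the homomorphic encryption domain; its core can be seen as an extension of the simplified fixed Hessian. We use this gradient variant to enhance Nesterov's accelerated gradient (NAG) and the Adaptive Gradient Algorithm (Adagrad), respectively, and evaluate the enhanced algorithms on several datasets. Experimental results show that the enhanced methods achieve state-of-the-art convergence speed compared with the naive first-order gradient methods. We then adopt the enhanced NAG method to implement homomorphic logistic regression training and obtain a result in only $3$ iterations.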
translated by 谷歌翻译
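The abstract above describes the quadratic gradient only as an extension of the simplified fixed Hessian, so the following sketch shows that underlying idea alone, in plaintext and without encryption. It assumes the usual bound of the logistic Hessian by $-\frac{1}{4}X^{\top}X$ and replaces that bound with a diagonal matrix of absolute row sums; the small constant `eps`, the use of absolute values, the function names and the toy data are assumptions made for illustration, not the paper's definition.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(X, y, beta):
    """Log-likelihood of binary logistic regression with labels in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def diagonal_fixed_hessian_scaling(X, eps=1e-8):
    """Per-coordinate step sizes 1 / (eps + 1/4 * sum_j |(X^T X)_ij|)."""
    d = len(X[0])
    scaling = []
    for i in range(d):
        row_sum = sum(abs(sum(x[i] * x[j] for x in X)) for j in range(d))
        scaling.append(1.0 / (eps + 0.25 * row_sum))
    return scaling


def train(X, y, iterations=30):
    """Gradient ascent preconditioned by the diagonal fixed-Hessian bound."""
    d = len(X[0])
    beta = [0.0] * d
    scaling = diagonal_fixed_hessian_scaling(X)   # computed once, data-only
    history = [log_likelihood(X, y, beta)]
    for _ in range(iterations):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            residual = yi - sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(d):
                grad[j] += residual * xi[j]
        beta = [b + s * g for b, s, g in zip(beta, scaling, grad)]
        history.append(log_likelihood(X, y, beta))
    return beta, history


# Tiny non-separable toy set; the first column is the bias feature.
X = [[1, 0.1], [1, 0.4], [1, 0.5], [1, 0.9], [1, 0.3], [1, 0.7]]
y = [0, 0, 1, 1, 1, 0]
beta, history = train(X, y)
```

The scaling depends only on the data, so it can be computed once before training, which is part of what makes this family of methods attractive when every encrypted operation is expensive.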
We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine if proposed U.S. Congressional bills are relevant to specific public companies and provide explanations and confidence levels. For the bills the model deems relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. We also test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022. The performance of text-davinci-002 is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve core natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
In this paper we derive a PAC-Bayesian-like error bound for a class of stochastic dynamical systems with inputs, namely linear time-invariant stochastic state-space models (stochastic LTI systems for short). This class of systems is widely used in control engineering and econometrics; in particular, it represents a special case of recurrent neural networks. In this paper we 1) formalize the learning problem for stochastic LTI systems with inputs, 2) derive a PAC-Bayesian-like error bound for such systems, and 3) discuss various consequences of this error bound.
We demonstrate how efficient autonomous drone swarms can be in detecting and tracking occluded targets in densely forested areas, such as lost people during search and rescue missions. Exploration and optimization of local viewing conditions, such as occlusion density and target view obliqueness, provide much faster and much more reliable results than previous, blind sampling strategies that are based on pre-defined waypoints. An adapted real-time particle swarm optimization and a new objective function are presented that are able to deal with dynamic and highly random through-foliage conditions. Synthetic aperture sensing is our fundamental sampling principle, and drone swarms are employed to approximate the optical signals of extremely wide and adaptable airborne lenses.
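For readers unfamiliar with the optimizer named above, a textbook particle swarm optimization loop can be sketched as follows. This is the generic algorithm, not the adapted real-time variant or the objective function from the abstract; the coefficients, swarm size, bounds and the sphere test function are illustrative assumptions. It is written as a minimizer, so an objective to be maximized (such as a visibility measure) would be negated.

```python
import random


def particle_swarm_minimize(objective, dim, bounds, num_particles=20,
                            iterations=100, inertia=0.7, cognitive=1.5,
                            social=1.5, seed=0):
    """Textbook particle swarm optimization over a box-constrained space."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]                  # per-particle best position
    best_val = [objective(p) for p in pos]
    g = min(range(num_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]      # swarm-wide best
    for _ in range(iterations):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best, and pull toward the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Simple convex test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_position, best_value = particle_swarm_minimize(sphere, dim=2,
                                                    bounds=(-5.0, 5.0))
```

In a drone setting each particle would correspond to a physical vehicle and each objective evaluation to a sensed measurement, which is presumably why the abstract stresses a real-time adaptation for dynamic conditions.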
Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating its applications to other AI domains and discussing practical recommendations.
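The phrase "statistical summaries of divergence frontiers" above can be made concrete for the simplest case of two discrete (for example, vector-quantized) distributions. The sketch below follows the commonly described formulation, in which the frontier is traced by mixtures of the two distributions and summarized by the area under the resulting curve; the scaling constant of 5, the number of mixture weights and the function names are assumptions for illustration, and none of the estimation methods or bounds from the abstract are reproduced.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_frontier(p, q, num_points=99, scale=5.0):
    """Points (exp(-c KL(q||r)), exp(-c KL(p||r))) for mixtures r of p and q."""
    points = [(1.0, 0.0)]                     # corner point closing the curve
    for k in range(1, num_points + 1):
        lam = k / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((math.exp(-scale * kl_divergence(q, r)),
                       math.exp(-scale * kl_divergence(p, r))))
    points.append((0.0, 1.0))                 # opposite corner point
    return points


def frontier_area_score(p, q, num_points=99, scale=5.0):
    """Area under the divergence frontier, in [0, 1]; higher means closer."""
    pts = divergence_frontier(p, q, num_points, scale)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x0 - x1) * (y0 + y1) / 2.0   # trapezoid rule; x is non-increasing
    return area


identical = frontier_area_score([0.5, 0.5], [0.5, 0.5])
disjoint = frontier_area_score([1.0, 0.0], [0.0, 1.0])
partial = frontier_area_score([0.7, 0.3], [0.4, 0.6])
```

The two KL terms correspond to the two error types mentioned in the abstract: mass the model places where the reference has little, and mass the reference has that the model fails to cover.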